Extreme codon bias is seen for the Saccharomyces cerevisiae genes for the fermentative alcohol dehydrogenase isozyme I (ADH-I) and glyceraldehyde-3-phosphate dehydrogenase. Over 96% of the 1004 amino acid residues analyzed by DNA sequencing are coded for by a select 26 of the 61 possible coding triplets. These preferred codons tend to be highly complementary to the anticodons of the major yeast isoacceptor tRNA species. Codons which would necessitate side by side GC base pairs between the codon and the tRNA anticodon are avoided whenever possible. Codons containing 100% G, C, A, U, GC, or AU are also avoided. This provides for approximately equivalent codon-anticodon binding energies for all preferred triplets. All sequenced yeast genes show a distinct preference for these same 25 codons. The degree of preference varies from greater than 90% for glyceraldehyde-3-phosphate dehydrogenase and ADH-I to less than 20% for iso-2-cytochrome c. The degree of bias toward these 25 preferred triplets in each gene is correlated with the level of its mRNA in the cytoplasm: genes which are strongly expressed are more biased than genes with a lower level of expression.

A similar phenomenon is observed in the codon preferences of highly expressed genes in Escherichia coli. High levels of gene expression are well correlated with high levels of codon bias toward 22 of the 61 coding triplets. As in yeast, these preferred codons are highly complementary to the anticodons of the major cellular isoacceptor tRNA species. In at least four cases (Ala, Arg, Leu, and Val), these preferred E. coli codons are incompatible with the preferred yeast codons.



Recent advances in nucleic acid analysis in general, and in DNA cloning and sequencing in particular, have made available a great deal of data on the primary structure of several viral, prokaryotic, and eukaryotic genes. The DNA or RNA from several mRNA-coding genes has been sequenced (compiled by Grantham and co-workers (1, 2)). The results obtained agree well with the triplet codon-amino acid assignments which had been determined by indirect means (see (1966) Cold Spring Harbor Symp. Quant. Biol. 31).

Most, if not all, mRNA-coding genes show a bias, sometimes subtle but always statistically significant, in the choice of which of several degenerate triplets are used to code for a particular amino acid. Different genes exhibit different patterns of nonrandom codon usage, but different genes in the same genome frequently have related codon preference rules (2). A major difficulty in assessing which factor(s) is involved in the selection by an organism of specific codon biases is the complex pattern of nonrandom codon utilization generally observed. Analyses by others suggest that no single, overriding selection process is responsible for the preferences in codon usage detected. This is not surprising, since a variety of constraints beyond coding for a specific peptide may act on an mRNA sequence. The codon biases observed in a mature mRNA primary sequence may be a function of selective preferences acting on mRNA processing and transport, mRNA translation efficiency, and/or mRNA secondary structure and stability. In addition, specific triplet preferences may reflect selective processes basically independent of mRNA structure and function, acting instead on the DNA sequence which the mRNA mirrors. These factors might include susceptibility of the DNA to mutagenic damage and cues for DNA replication, chromatin assembly, or even RNA polymerase promotion and termination (3, 4). To date, most of the models proposed for the basis of nonrandom codon usage have primarily involved various aspects of mRNA structure and translational efficiency (1, 2, 5-11).

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

† Present address, Department of Biological Sciences, Stanford University, Stanford, California 94305.

Recent sequencing of the yeast genes coding for the very abundant proteins glyceraldehyde-3-phosphate dehydrogenase (12, 13) and alcohol dehydrogenase isozyme I (14) has disclosed three cases of codon selection far more strict than any yet seen. We have analyzed these sequences and compared them to the codon usage observed for several other sequenced genes from Saccharomyces cerevisiae. These data suggest that in bakers' yeast a common selective mechanism acts to heavily bias codon representation in the genes for ADH-I¹ and glyceraldehyde-3-phosphate dehydrogenase.

RESULTS AND DISCUSSION 

Codon Usage in Six Yeast Genes. The published DNA sequence data for several yeast genes make it possible in each case to determine accurately both the amino acid sequence of the corresponding protein and the codon used to code for each of its amino acids. Collation and summation of the codons used for each of seven yeast proteins yields the codon utilization summary in Table I. For two of these proteins, glyceraldehyde-3-phosphate dehydrogenase and alcohol dehydrogenase I, the usage of the 61 possible codon triplets is highly biased. For the two glyceraldehyde-3-phosphate dehydrogenase genes, only 29 codons are used, and for ADH-I, only 33. In contrast, the genes which code for yeast iso-1- and iso-2-cytochrome c employ 41 and 43 different codons, respectively.
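A tally like the one summarized in Table I comes from splitting each in-frame coding sequence into triplets and counting them; the number of distinct keys is the "codons used" figure quoted above. The sketch below assumes an in-frame coding sequence without introns, written as the DNA plus strand; `codon_usage` is an illustrative name and the toy sequence is invented, not taken from any yeast gene.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Tally codons in an in-frame, intron-free coding sequence.

    The input is the DNA plus strand; codons are reported in mRNA notation
    (T -> U), the form used for the triplets in Table I.
    """
    seq = cds.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


# Hypothetical toy fragment (9 codons), for illustration only.
usage = codon_usage("ATGGTTAAGGCTGGTTTGAAGGTTGCT")
print(sorted(usage.items()))
print("distinct codons used:", len(usage))  # distinct codons used: 6
```

A stop codon, if present in the input, would simply be tallied with the rest; Table I lists only the 61 sense codons.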

The great similarities between the yeast glyceraldehyde-3-phosphate dehydrogenase and ADH-I genes, both in the degree and in the direction of their codon preferences, suggest that this usage pattern has functional significance and is not merely a statistical anomaly. Each one of the 28 codons not used in the ADH-I gene is also not used in either of the two genes coding for glyceraldehyde-3-phosphate dehydrogenase (Table I). It is evident, from the data in Table I and from more detailed comparisons made below, that codon usage for the yeast H2B1, H2B2, CYC1, and CYC7 genes is biased in this same direction, but to a lesser degree.

¹ The abbreviation used is: ADH-I, alcohol dehydrogenase isozyme I.

Most of the codon preferences manifested by the glyceraldehyde-3-phosphate dehydrogenase and ADH-I genes can be summarized by the following empirical rules.

1. For serine (UCN) and for four of the six amino acids which have 3- or 4-fold coding degeneracy (XYN), the codons XYC and XYU are used with roughly equal probability, whereas the codons XYA and XYG are never used. Serine, isoleucine, valine, threonine, and alanine follow this rule; proline and glycine do not.

2. For 2-fold degenerate codons with a pyrimidine in the wobble position, XYC is used and XYU is not. This holds absolutely for phenylalanine, tyrosine, histidine, and asparagine, while aspartic acid has a 2-fold preference for C over U. This rule does not hold for cysteine, as is discussed below.

3. For leucine (UUR), arginine (AGR), and for 2-fold degenerate codons with a purine in the third position, one of the alternative codons is used almost to the complete exclusion of the other (UUG for Leu, AAG for Lys, CAA for Gln, GAA for Glu, and AGA for Arg).

4. For the two 4-fold degenerate amino acids (Gly, Pro) which do not obey the first rule, the predominant codon choices (CCA for Pro and GGU for Gly) are those which prevent the codon from being either 100% GC, 100% purine, or 100% pyrimidine.
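Rule 4 is mechanical enough to check directly against the two codon families it covers. The sketch below (the function name `passes_rule_4` is illustrative) tests whether a codon is entirely G+C, entirely purine, or entirely pyrimidine, and shows that exactly one member of each of the GGN and CCN families escapes all three conditions.

```python
GC = set("GC")
PURINES = set("AG")
PYRIMIDINES = set("CU")


def passes_rule_4(codon: str) -> bool:
    """True unless the codon is 100% G+C, 100% purine, or 100% pyrimidine."""
    bases = set(codon)
    return not (bases <= GC or bases <= PURINES or bases <= PYRIMIDINES)


for prefix in ("GG", "CC"):
    family = [prefix + third for third in "UCAG"]
    print(prefix + "N:", [c for c in family if passes_rule_4(c)])
# GGN: ['GGU']
# CCN: ['CCA']
```

The surviving codons are the predominant glycine and proline choices named in rule 4; the rule says nothing about the other amino acids, which rules 1 to 3 cover.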

Codon Usage versus Isoaccepting tRNA Abundance. A formal explanation for these empirical rules of codon preference can be found in the relative abundances of different isoaccepting yeast tRNAs and in their anticodon sequences. For each of the 16 amino acids whose tRNAs have been sequenced, the major isoaccepting species present in yeast is, in fact, the one with an anticodon allowing it to translate the most frequently used codon (or XYU/XYC codon pair) for that amino acid (Table II). Of the amino acids, including methionine and tryptophan, for which a single codon is used extensively or predominantly in glyceraldehyde-3-phosphate dehydrogenase and ADH-I, nine have a major tRNA whose anticodon is exactly complementary to that codon triplet. The two exceptions are glycine and cysteine. Thus, it would appear that when the major tRNA for an amino acid has as its wobble base either U, C, or G, there is selection for those codons in mRNA that can form a standard Watson-Crick base pair at the third position and against the alternative codon which would require wobble pairing. A different restriction in the use of wobble pairing (23) is seen for those major yeast tRNAs which have an inosine residue at anticodon position one (Table II, lines 1-4). Each of these corresponds to an amino acid coded, both in glyceraldehyde-3-phosphate dehydrogenase and in ADH-I, exclusively by pyrimidine-ended codons and never by the related purine-ended codons. This absolute correlation suggests that I-C and I-U base pairs are favored in codon-anticodon interaction and that I-A base pairs, although theoretically possible (23), may not actually occur. Consistent with this conclusion is the observation that none of the 11 yeast serine tRNA genes (24) which code for tRNA^Ser (anticodon IGA) have been found to be mutable to efficient ochre (UAA) suppressors (25). Further evidence for the absence of I-A base pairing is the existence in yeast of a separate UCA-decoding minor serine isoacceptor (26). UCG codons apparently are translated by yet another minor serine tRNA (26). In keeping with the paucity of these latter two tRNAs, UCA and UCG are not used to code for serine in yeast glyceraldehyde-3-phosphate dehydrogenase and ADH-I.

Codon-Anticodon Interactions. The two exceptions to the rule of strict complementarity between codon and anticodon of the major tRNA are cysteine (UGU)-anticodon GCA and glycine (GGU)-anticodon GCC. The choice of U rather than C in each of these cases may be explained by a strong avoidance of side by side GC base pairs in yeast codon-anticodon interactions (see below). With U having been chosen for codon position three, a G-U or I-U pair is made inevitable, because A residues are never found in yeast tRNAs at the wobble base position: adenosine residues specified for this position by the gene sequence are deaminated during tRNA maturation to form inosine residues. Therefore, the presence of U at position three in Cys and Gly codons precludes perfect complementarity, because A at the corresponding anticodon position is unattainable.
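The pairing rules invoked in the last two paragraphs can be written as a small lookup. This is a simplified sketch, assuming classic wobble pairing plus the paper's conclusion that inosine reads U and C but not A; modified nucleosides (Gm, Ψ, Cm, U*) would have to be entered as their parent base, and any effect of the modification on decoding is ignored. The names `can_decode` and `is_watson_crick` are illustrative.

```python
# Watson-Crick partners in RNA.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases read by each anticodon position-1 (wobble) base.
# Simplified wobble rules, with inosine reading U and C but not A, as argued in
# the text; A is omitted because yeast tRNAs do not carry A at this position.
WOBBLE_READS = {"G": {"C", "U"}, "U": {"A", "G"}, "C": {"G"}, "I": {"C", "U"}}


def can_decode(anticodon: str, codon: str) -> bool:
    """Both written 5'->3'; anticodon position 1 pairs with codon position 3."""
    a1, a2, a3 = anticodon
    c1, c2, c3 = codon
    return (
        WATSON_CRICK.get(a3) == c1
        and WATSON_CRICK.get(a2) == c2
        and c3 in WOBBLE_READS.get(a1, set())
    )


def is_watson_crick(anticodon: str, codon: str) -> bool:
    """True only if the third-position pair is also a standard Watson-Crick pair."""
    return can_decode(anticodon, codon) and WATSON_CRICK.get(anticodon[0]) == codon[2]


print([c for c in ("UCU", "UCC", "UCA", "UCG") if can_decode("IGA", c)])  # ['UCU', 'UCC']
print(can_decode("GCA", "UGU"), is_watson_crick("GCA", "UGU"))           # True False
```

The first line reproduces the serine case (the IGA tRNA reads only the pyrimidine-ended codons), and the second shows why UGU forces a G-U wobble pair with the GCA anticodon of tRNA^Cys.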

Table I
Codon usage for eight yeast genes

The table lists the number of times each triplet appears in the plus strand of the DNA sequence coding for each of the following yeast proteins: 49, glyceraldehyde-3-phosphate dehydrogenase clone pgap491 (12, 13); 63, glyceraldehyde-3-phosphate dehydrogenase clone pgap63 (13); AD, alcohol dehydrogenase I (14); B1, histone H2B, gene H2B1 (15); B2, histone H2B, gene H2B2 (16); C1, iso-1-cytochrome c (16); C2, iso-2-cytochrome c. The AA column gives the amino acid encoded (standard genetic code); termination codons are not listed.

Codon  AA     49  63  AD  B1  B2  C1  C2
UUU    Phe     0   0   0   1   1   2   3
UUC    Phe    10  11   8   1   1   2   1
UUA    Leu     0   0   2   1   1   1   2
UUG    Leu    21  20  19   5   5   5   3
UCU    Ser    13  11  14  11   9   2   2
UCC    Ser    12  14   7   6   9   0   1
UCA    Ser     0   0   0   0   1   1   2
UCG    Ser     0   0   0   0   0   1   0
UAU    Tyr     0   0   0   1   3   2   3
UAC    Tyr    10  10  13   4   2   3   2
UGU    Cys     2   2   8   0   0   2   2
UGC    Cys     0   0   0   0   0   1   0
UGG    Trp     3   3   5   0   0   1   1
CUU    Leu     0   0   0   0   0   1   0
CUC    Leu     0   0   0   0   0   0   0
CUA    Leu     0   1   3   0   0   1   0
CUG    Leu     0   0   0   0   0   0   0
CCU    Pro     0   1   2   1   1   1   3
CCC    Pro     0   0   1   0   0   0   0
CCA    Pro    12  10  10   4   4   3   2
CCG    Pro     0   0   0   0   0   0   0
CAU    His     0   0   1   1   1   2   3
CAC    His     8   8  10   1   1   2   0
CAA    Gln     5   6   9   4   3   2   0
CAG    Gln     0   0   0   0   1   0   3
CGU    Arg     0   0   0   0   0   0   0
CGC    Arg     0   0   0   0   0   0   0
CGA    Arg     0   0   0   0   0   0   0
CGG    Arg     0   0   0   0   0   0   0
AUU    Ile     9   7   9   3   4   2   3
AUC    Ile    11  12  12   5   4   2   1
AUA    Ile     0   0   0   0   0   0   1
AUG    Met     6   7   7   1   1   3   3
ACU    Thr    12  10   5   9   6   3   1
ACC    Thr    12  13   9   2   3   3   1
ACA    Thr     0   0   0   1   2   2   3
ACG    Thr     0   0   0   0   0   0   4
AAU    Asn     0   0   0   0   0   2   1
AAC    Asn    12  14  11   3   3   5   6
AAA    Lys     1   2   4   7   8   6   9
AAG    Lys    25  24  20  13  11  10   8
AGU    Ser     0   0   0   0   0   0   2
AGC    Ser     0   0   0   1   0   0   0
AGA    Arg    11  11   8   6   5   3   2
AGG    Arg     0   0   0   0   1   0   1
GUU    Val    22  23  19   4   5   1   1
GUC    Val    15  12  17   1   2   0   1
GUA    Val     0   0   0   0   0   0   1
GUG    Val     0   0   0   0   0   2   0
GCU    Ala    26  23  19  12  10   3   4
GCC    Ala     6  10  16   4   7   4   1
GCA    Ala     0   0   0   1   0   0   2
GCG    Ala     0   0   0   1   0   0   1
GAU    Asp     8   7   2   2   2   1   4
GAC    Asp    17  13  14   1   1   3   1
GAA    Glu    14  15  20   8   7   5   3
GAG    Glu     0   0   0   0   1   2   3
GGU    Gly    25  24  41   4   4   8   8
GGC    Gly     0   0   3   0   0   2   1
GGA    Gly     0   0   0   0   0   0   3
GGG    Gly     0   0   0   0   0   2   0

Table II
Preferred codons in the ADH-I and glyceraldehyde-3-phosphate dehydrogenase (G3PDH) genes in relationship to anticodon sequences in the major yeast isoaccepting tRNAs

All nucleic acid sequences listed are written 5'-NNN-3'; anticodon position one (the 5' base) pairs with codon position three. Transfer RNA anticodon sequences are derived from the compilation of Gauss et al. (18). Determination of which of two or more isoaccepting tRNA species is the major one in yeast was done using the BD-cellulose column chromatographic profiles of Gillam et al. (19) and the Sepharose 4B chromatographic experiments of Culbertson et al. (20). The major arginine isoacceptor, tRNA^Arg, was purified and sequenced by Kunzel and co-workers (21, 22). For four of the tRNAs listed, only one isoaccepting species is known. Gm is 2'-O-methylguanosine; Ψ is pseudouridine; Cm is 2'-O-methylcytidine; and U* is believed to be a uridine derivative (18). The codon usage figures are the sum of the listed values for the two glyceraldehyde-3-phosphate dehydrogenase genes (12, 13) and the ADH-I gene (14) given in Table I.

Amino   Major isoacceptor and      Preferred triplet(s) and the       Use of all other
acid    its anticodon sequence     extent of their use in             triplets for that
                                   G3PDH + ADH-I                      amino acid
Ala     tRNA^Ala   IGC             GCU (68), GCC (32)                 0
Ser     tRNA^Ser   IGA             UCU (38), UCC (33)                 0
Thr     tRNA^Thr   IGU             ACU (27), ACC (34)                 0
Val     tRNA^Val   IAC             GUU (64), GUC (44)                 0
Ile     None sequenced (a)         AUU (25), AUC (35)                 0
Asp     tRNA^Asp   GUC             GAC (44)                           17
Phe     tRNA^Phe   GmAA            UUC (29)                           0
Tyr     tRNA^Tyr   GΨA             UAC (33)                           0
Cys     tRNA^Cys   GCA             UGU (12)                           0
Asn     None sequenced             AAC (37)                           0
His     None sequenced             CAC (26)                           1
Arg     tRNA^Arg   U*CU            AGA (30)                           0
Glu     tRNA^Glu   UUC             GAA (49)                           0
Leu     tRNA^Leu   CAA             UUG (60)                           6
Lys     tRNA^Lys   CUU             AAG (69)                           7
Gly     tRNA^Gly   GCC             GGU (90)                           3
Gln     None sequenced             CAA (20)                           0
Pro     None sequenced             CCA (32)                           4
Met     tRNA^Met   CAU             AUG (20)
Trp     tRNA^Trp   CmCA            UGG (11)

(a) The most abundant isoleucine tRNA species in the yeast Torulopsis utilis has the anticodon 5'-IAU-3' (18).

Given the correlation noted between codons frequently used and tRNAs present most abundantly, there remains the question of why natural selection has favored these particular codons and tRNAs rather than others. Why, for example, are CGN arginine codons not used at all in either glyceraldehyde-3-phosphate dehydrogenase or ADH-I? This is but one manifestation of a general tendency not to use codons containing GC, CG, CC, or GG if this can in any way be avoided. The codons UCG, CCG, ACG, GCA, GCG, CGU, CGC, CGA, CGG, AGG, GGG, AGC, GGC, and CCC are totally absent from the mRNAs for ADH-I and glyceraldehyde-3-phosphate dehydrogenase. For each of these triplets, perfectly complementary codon-anticodon binding would entail side by side GC base pairs. This situation is always avoided, whenever possible, in the ADH-I and glyceraldehyde-3-phosphate dehydrogenase genes. Since only CCN triplets can code for proline, side by side GC base pairs are unavoidable there. However, all yeast genes which have been sequenced contain primarily CCA (or CCU) proline codons (Table I) and avoid CCC and CCG triplets more than 99% of the time. Although UCC serine, ACC threonine, and GCC alanine codons occur frequently in the ADH-I and glyceraldehyde-3-phosphate dehydrogenase genes, the presence of inosine as the wobble nucleotide in the tRNAs that recognize these codons (Table II) avoids the formation of side by side GC base pairs in the codon-anticodon interaction. A possible functional basis for the avoidance of side by side GC base pairs is contained in the analysis of codon-anticodon binding strength made by Grosjean et al. (27). These authors noted a tendency for different codon-anticodon pairs to have relatively similar binding constants. Higher binding constants than average would obtain if two GC base pairs were involved. Hence, the bias against GC-, CG-, CC-, and GG-containing codons in yeast equalizes triplet-anticodon interactions through such codon choices as AGA (Arg) rather than CGN, and GGU (Gly) rather than GGG or GGC. In other words, these and other similar codon choices which discriminate against side by side GC base pairs can have the effect of smoothing out the differences in codon-anticodon binding strength for different amino acids. Further examples of this "binding energy homeostasis" are evidenced by the predominant codon choices for leucine (UUG), tyrosine (UAC), lysine (AAG), and others, which are made in such a way as to preclude codon-anticodon binding between sequences which are 100% A + U.
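The two avoidance rules in the preceding paragraph (no side by side G-C pairs, and no duplex made only of A-U pairs) can be expressed as simple predicates on a codon. This is a sketch under a stated simplification: every position is assumed to pair in Watson-Crick fashion, so the inosine case for UCC, ACC, and GCC discussed above is not modeled. The function names are illustrative.

```python
GC = set("GC")
AU = set("AU")


def has_adjacent_gc_pairs(codon: str) -> bool:
    """True if Watson-Crick pairing at every position would put two G-C pairs
    side by side, i.e. codon positions 1-2 or 2-3 are both G or C."""
    return any(codon[i] in GC and codon[i + 1] in GC for i in range(2))


def is_all_au(codon: str) -> bool:
    """True if the codon-anticodon duplex would consist only of A-U pairs."""
    return set(codon) <= AU


arg_codons = ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"]
print([c for c in arg_codons if not has_adjacent_gc_pairs(c)])  # ['AGA']

for pair in (("UUA", "UUG"), ("UAU", "UAC"), ("AAA", "AAG")):
    print(pair, "->", [c for c in pair if not is_all_au(c)])
```

Of the six arginine codons, only AGA avoids side by side G-C pairs, and within each of the Leu, Tyr, and Lys pairs the retained codon is the one the text names as predominant (UUG, UAC, AAG).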

The results presented here suggest that the codons in the glyceraldehyde-3-phosphate dehydrogenase and ADH-I genes have evolved to produce optimal and uniform codon-anticodon binding energies with the most abundant isoacceptor tRNAs in the cell. Various other authors have invoked tRNA availability (1, 7, 8, 11, 28, 29) and codon-anticodon interactions (6, 9) to explain nonrandom codon usage. Both phenomena seem to be involved for the glyceraldehyde-3-phosphate dehydrogenase and ADH-I genes. Independently of one another, these highly expressed genes have evolved coding sequences which optimize interaction with the most abundant tRNA molecules. Other types of selective pressure on codon usage must be less significant for these two yeast genes.

The results on yeast codon selection presented here, and their correlation with tRNA abundance and codon-anticodon binding efficiency, allow prediction of the anticodon sequences of various, as yet unsequenced, S. cerevisiae tRNAs. For instance, the major tRNA^Ile isoacceptor should have a 5'-IAU-3' anticodon, the major tRNA^Pro species should have a UGG (or *UGG) anticodon, and the most abundant tRNA^Gln should have a UUG anticodon.
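The predictions above follow from taking the exact complement of the preferred codon and then applying the A-to-I deamination at the wobble position. A minimal sketch of that reasoning, with `predicted_anticodon` as an illustrative name; it does not predict other base modifications such as the possible *U in the tRNA^Pro anticodon.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def predicted_anticodon(codon: str) -> str:
    """Anticodon (5'->3') exactly complementary to a codon (5'->3'), with an A
    at the wobble position (anticodon position 1) replaced by inosine, since
    yeast tRNAs carry I rather than A there."""
    anticodon = "".join(COMPLEMENT[base] for base in reversed(codon))
    if anticodon[0] == "A":
        anticodon = "I" + anticodon[1:]
    return anticodon


for aa, codon in [("Ile", "AUU"), ("Pro", "CCA"), ("Gln", "CAA")]:
    print(aa, codon, "->", predicted_anticodon(codon))
# Ile AUU -> IAU
# Pro CCA -> UGG
# Gln CAA -> UUG
```

Applied to codons whose tRNAs are already sequenced, the same function returns the Table II anticodons for, e.g., Leu (UUG gives CAA) and Glu (GAA gives UUC), ignoring modifications.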

Codon Bias and mRNA Abundances. The data for codon selection in ADH-I and glyceraldehyde-3-phosphate dehydrogenase point quite directly to a preferred set of 25 out of the 61 possible triplets. For 15 amino acids only one triplet is preferred, while for Thr, Ala, Ile, Val, and Ser either of two codons, one having a U and the other a C in the wobble position, is allowed. If we term these 25 codons the "preferred triplets," it is possible to measure the distance between any given mRNA sequence and the "preferred" mRNA sequence that could code for that particular protein. Table III presents such an analysis of codon bias for six sequenced mRNA-coding S. cerevisiae genes. The Codon Bias Index is a measure of the fraction of codon choices which are biased toward the 22 preferred triplets that remain after the exclusions described below. A value of one indicates that only codons of the preferred variety are used throughout the mRNA. A value of zero indicates totally random choice. A Codon Bias Index significantly less than zero for a given gene indicates greater than random usage of the nonpreferred triplets. In calculating the Codon Bias Index for yeast proteins, we have excluded codons for methionine, tryptophan, and aspartic acid (the latter exhibits a degree of codon bias less than the 85% cutoff arbitrarily chosen to define a "preferred" triplet).
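Following the definition in the Table III footnote, the index can be written as CBI = (N_pfr - N_rand) / (N_tot - N_rand), where N_pfr counts uses of preferred codons, N_tot counts residues of the 17 included amino acids, and N_rand is the sum over those amino acids of (residues of that amino acid) x (preferred codons / synonymous codons). The sketch below implements that formula; the function and table names are illustrative, and the example feeds it only three amino acids from the ADH-I column of Table I, so it does not reproduce the whole-gene value.

```python
# Preferred codons and synonymous-family sizes for the 17 amino acids used in
# the index (Met, Trp, and Asp excluded), transcribed from the Table III footnote.
PREFERRED = {
    "Phe": {"UUC"}, "Leu": {"UUG"}, "Ile": {"AUU", "AUC"},
    "Val": {"GUU", "GUC"}, "Ser": {"UCU", "UCC"}, "Pro": {"CCA"},
    "Thr": {"ACU", "ACC"}, "Ala": {"GCU", "GCC"}, "Tyr": {"UAC"},
    "His": {"CAC"}, "Gln": {"CAA"}, "Asn": {"AAC"}, "Lys": {"AAG"},
    "Glu": {"GAA"}, "Cys": {"UGU"}, "Arg": {"AGA"}, "Gly": {"GGU"},
}
FAMILY_SIZE = {
    "Phe": 2, "Leu": 6, "Ile": 3, "Val": 4, "Ser": 6, "Pro": 4,
    "Thr": 4, "Ala": 4, "Tyr": 2, "His": 2, "Gln": 2, "Asn": 2,
    "Lys": 2, "Glu": 2, "Cys": 2, "Arg": 6, "Gly": 4,
}


def codon_bias_index(counts: dict[str, dict[str, int]]) -> float:
    """CBI = (N_pfr - N_rand) / (N_tot - N_rand).

    counts maps amino acid -> {codon: occurrences}; amino acids outside the
    17 listed above (e.g. Met, Trp, Asp) are skipped.
    """
    n_pfr = n_tot = n_rand = 0.0
    for aa, by_codon in counts.items():
        if aa not in PREFERRED:
            continue
        n_aa = sum(by_codon.values())
        n_tot += n_aa
        n_pfr += sum(n for codon, n in by_codon.items() if codon in PREFERRED[aa])
        # Random expectation: residues x (preferred codons / codons in family).
        n_rand += n_aa * len(PREFERRED[aa]) / FAMILY_SIZE[aa]
    if n_tot == n_rand:
        raise ValueError("index undefined: no residues beyond the random expectation")
    return (n_pfr - n_rand) / (n_tot - n_rand)


# Leu, Lys, and Gly counts for ADH-I, copied from Table I (a subset only).
adh_subset = {
    "Leu": {"UUA": 2, "UUG": 19, "CUA": 3},
    "Lys": {"AAA": 4, "AAG": 20},
    "Gly": {"GGU": 41, "GGC": 3},
}
print(round(codon_bias_index(adh_subset), 3))  # 0.815
```

Exclusively preferred usage gives 1.0, codon choice in proportion to the code gives 0.0, and exclusively nonpreferred usage gives a negative value, matching the interpretation given in the text.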

Table III
The Codon Bias Index and approximate cellular mRNA levels for eight yeast genes

The Codon Bias Index is a fraction whose numerator is the total number of times that the preferred codons are used in the protein minus the number of such usages expected if the code were read randomly. The denominator is the total number of amino acid residues in the protein (excluding methionine, tryptophan, and aspartic acid residues) minus the random expectation for usage of the preferred codons. The latter quantity, which appears in both the numerator and the denominator, is a sum of 17 products, each one being the number of residues in the protein of a given amino acid multiplied by the fraction of all codons for that amino acid in the genetic code dictionary which are "preferred" (1/6 for Leu and Arg; 2/3 for Ile; 1/2 for Phe; etc.). The preferred codons were chosen as those that were used greater than 85% of the time in the ADH-I and glyceraldehyde-3-phosphate dehydrogenase (G3PDH) genes. By this criterion, only aspartic acid was disqualified as not having significant codon bias (GAC/GAU = 44/17). Data for Met and Trp are also not included, as these amino acids have no degenerate codons. The ideal codons chosen were UUC, UUG, AUU, AUC, GUU, GUC, UCU, UCC, CCA, ACU, ACC, GCU, GCC, UAC, CAC, CAA, AAC, AAG, GAA, UGU, AGA, and GGU. The approximate cellular mRNA concentrations were determined by wheat germ translation for glyceraldehyde-3-phosphate dehydrogenase, ADH-I, and enolase,² and for iso-1-cytochrome c (30). The level of histone 2B mRNA was determined by L. Hereford.³ The value for the iso-2-cytochrome c level was approximated by dividing the iso-1-cytochrome c mRNA level by 17; the ratio of iso-1- to iso-2-cytochrome c protein is known to be about 17 to 1 (31). The codon biases of the two yeast enolase genes (peno8 and peno46) were calculated from the data of Holland et al. (32).

Gene                     Codon Bias   Approximate % of
                         Index        total cellular mRNA
pgap63 (G3PDH)           0^           1.5-6
pgap491 (G3PDH)          0.98
peno8                    0.96         1-3
peno46                   0.93
ADH-I                    0.92         0.7-2
Histone 2B (H2B1)        0.75         0.4
Histone 2B (H2B2)        0.68         0.4
Iso-1-cytochrome c       0.50         0.05
Iso-2-cytochrome c       0.15         0.003

The observed values of the Codon Bias Index are very nearly 1.0 for both glyceraldehyde-3-phosphate dehydrogenase genes and 0.92 for ADH-I, in keeping with the definition of preferred triplets as those most used in these genes. For the two histone 2B genes, the values are significantly lower, and for the iso-1-cytochrome c gene, the codon bias is only 0.5. For iso-2-cytochrome c, the choice is nearly random as regards preferred versus nonpreferred triplets. In all cases where the intracellular level of mRNA or protein is known, this level correlates well with the degree of codon bias (Table III). The most abundant protein (glyceraldehyde-3-phosphate dehydrogenase) is the most biased, the least abundant protein is the least biased, and the proteins of intermediate abundance fall into place as well.
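The correlation described above can be expressed, for illustration, as a Spearman rank correlation. This is not a statistic the authors report; it is a sketch under two assumptions made here: mRNA ranges in Table III are replaced by their midpoints, and only the six genes that have both a legible Codon Bias Index and their own mRNA entry are used. With six points the result is illustrative only.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation of the (tie-averaged) ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# (gene, Codon Bias Index, approximate % of cellular mRNA) from Table III;
# ranges are replaced by their midpoints (an assumption made here).
genes = [
    ("peno8", 0.96, (1 + 3) / 2),
    ("ADH-I", 0.92, (0.7 + 2) / 2),
    ("H2B1", 0.75, 0.4),
    ("H2B2", 0.68, 0.4),
    ("iso-1-cytochrome c", 0.50, 0.05),
    ("iso-2-cytochrome c", 0.15, 0.003),
]
rho = spearman_rho([g[1] for g in genes], [g[2] for g in genes])
print(f"Spearman rho = {rho:.3f}")
```

A rank statistic suits these data because the mRNA levels span three orders of magnitude and several are only known as ranges.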

Physiological Basis for Biased Codon Usage. Our analysis of codon usage for yeast nuclear genes has led to two major conclusions. First, the codon which is most frequently used to specify a given amino acid is, in nearly every case, exactly complementary to the anticodon sequence of the most abundant isoaccepting tRNA for that amino acid. Consequently, there is a limited set of codon triplets, 25 in all, which can be defined as "preferred codons." Second, for all yeast proteins, the preferred set is the same one. However, there is considerable variation in the extent to which any given protein uses only preferred codons or instead draws indiscriminately from the entire set of 61 sense codons. There is a remarkably strong correlation (Table III) between the extent of codon usage bias toward these 25 preferred codons and the level of a particular protein (and its mRNA) in yeast cells. Because a similar situation exists in Escherichia coli cells (see below), there would appear to be some general physiological reason why abundant proteins have biased codon usage and rare proteins do not.

² J. Bennetzen, C. Denis, and E. T. Young, unpublished results.
³ L. Hereford, personal communication.

The explanation which comes to mind most immediately involves the translation rate of mRNA for abundant proteins. Because these proteins are required at high intracellular levels (which we assume simply because they are abundant), it could be presumed that there is selective pressure to translate their mRNAs rapidly and repetitively. Because the concentration of charged cognate tRNA governs the step time required to add an amino acid opposite each codon, rapid translation is favored by the use of triplets for abundant tRNAs. Consequently, "synonymous" mutations from triplets for high abundance tRNAs to triplets for low abundance isoacceptors might be deleterious if they were to occur in genes for proteins required in high abundance. The continuing selection for a high output of that particular protein will act to retain a set of preferred codons within the gene. Conversely, for those proteins, such as iso-2-cytochrome c, which are not required in large amounts, speed of translation has little selective value. Consequently, synonymous mutations to triplets decoded by low abundance tRNAs would not be strongly selected against in the genes for these proteins, and their pattern of codon usage would be more nearly random, as is the case.
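The step-time argument can be illustrated with a toy calculation, assuming (as a simplification made here) that the waiting time at each codon is inversely proportional to the concentration of its charged cognate tRNA. The tRNA levels below are hypothetical numbers chosen for illustration, not measurements, and `decoding_time` is an illustrative name.

```python
# Hypothetical relative levels of charged tRNA for two serine codons
# (arbitrary units chosen for illustration, not measured values).
CHARGED_TRNA = {"UCU": 10.0, "UCG": 0.5}


def decoding_time(codons: list[str], k: float = 1.0) -> float:
    """Total waiting time over the codons, assuming each step time is
    proportional to 1 / (k * [charged cognate tRNA])."""
    return sum(1.0 / (k * CHARGED_TRNA[c]) for c in codons)


all_preferred = ["UCU"] * 20
some_rare = ["UCU"] * 15 + ["UCG"] * 5
print(f"{decoding_time(all_preferred):.1f} vs {decoding_time(some_rare):.1f}")  # 2.0 vs 11.5
```

Under this model a quarter of the codons switched to a rare-tRNA triplet dominates the total decoding time, which is the intuition behind selection for preferred codons in highly translated mRNAs.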

Certain questions are raised by the hypothesis that evolutionary selection for a high rate of translation of highly expressed yeast genes is responsible for their bias toward codons which match the most abundant tRNA isoacceptors. Why couldn't the same high output of protein be achieved in other ways, by DNA changes which increase the rate of transcription, enhance the stability of mRNA, or augment the number of gene copies (13)? Optimization of each of these genetic parameters might equally well have served to provide a high output of gene product.

Another hypothesis to explain codon usage bias for highly expressed genes postulates a more pervasive deleterious effect, should codons corresponding to rare tRNA isoacceptors be used in mRNAs of high abundance. Consider what the effects would be if the serine codon UCG, corresponding to a rare isoaccepting tRNA^Ser (26), were used extensively in coding for an abundant yeast protein such as glyceraldehyde-3-phosphate dehydrogenase. Because this UCG-decoding tRNA^Ser is scarce, the intracellular pool of its aminoacylated form must be small relative to that for other tRNAs. When the hypothetical yeast strain grows on glucose medium, synthesizing large amounts of glyceraldehyde-3-phosphate dehydrogenase mRNA, translation of this RNA would draw heavily upon the pool of charged UCG-decoding Ser-tRNA^Ser, discharging a large fraction of the molecules. Consequently, all yeast ribosomes at that moment translating mRNAs containing UCG codons could suffer a block in translation, with consequent risk of early termination and/or translational error (33). Thus, the presence of UCG codons in highly expressed genes such as that for glyceraldehyde-3-phosphate dehydrogenase would have deleterious effects upon many intracellular targets. On an evolutionary time scale, these multiple effects could be eliminated by single base pair changes (UCG → UCC) at third position sites within the glyceraldehyde-3-phosphate dehydrogenase gene. On the other hand, the occasional usage of the UCG-decoding tRNA^Ser by hundreds of genes making proteins of medium or low abundance (such as iso-1-cytochrome c) brings about little discharging of this rare isoaccepting tRNA. Consequently, there is no strong selective pressure against UCG



3030 



Codon Selection in Yeast 



codons in mRNAs coding for this class of proteins. The example just given can be generalized as an explanation for biased codon usage for all amino acids having multiple isoaccepting tRNAs. The bias observed for phenylalanine (UUC), tyrosine (UAC), and cysteine (UGU) codon usage has some other basis, since only one tRNA species is present for each of these amino acids.
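The contrast between one abundant message and many rare ones can be put in rough numbers by treating the draw on a tRNA pool as proportional to mRNA abundance times the number of times the codon appears per message. This sketch assumes every mRNA is translated at the same per-molecule rate, and all numbers are hypothetical, chosen only to illustrate the argument; `trna_demand` is an illustrative name.

```python
def trna_demand(genes: list[tuple[float, int]]) -> float:
    """Relative draw on one tRNA pool: sum of (share of cellular mRNA x
    occurrences of the codon per mRNA), assuming all mRNAs are translated
    at the same per-molecule rate."""
    return sum(share * uses for share, uses in genes)


# Hypothetical numbers for illustration only.
one_abundant = [(0.05, 10)]       # one gene: 5% of mRNA, 10 UCG codons
many_rare = [(0.0001, 1)] * 300   # 300 genes: 0.01% of mRNA each, 1 UCG each
ratio = trna_demand(one_abundant) / trna_demand(many_rare)
print(f"the abundant gene draws about {ratio:.0f}x more on the UCG-decoding tRNA")
```

In this toy setting the single abundant message, not the hundreds of rare ones, accounts for most of the demand on the rare tRNA, which is why selection would act on the abundant gene's codons.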

If the selective mechanisms we have proposed are partly or wholly responsible for the differences in codon usage between abundant and rare yeast proteins, this implies that effectively different genetic codes are operative for the two classes of proteins. A set of 20 tRNAs, efficiently recognizing 26 codons, translates nearly all of the coding sequences of the abundant yeast proteins. In addition to these major tRNAs, approximately 20 minor isoaccepting tRNAs are required for translation of minor yeast proteins. These minor tRNAs are, in effect, specialized for that function.

In proposing two different (and not mutually exclusive) physiological explanations for the observed pattern of yeast codon bias, we have assumed that a deviation from that pattern would result in lowered fitness for the yeast cell. This assumption can be tested, and the nature of any deleterious effect identified, by altering a highly expressed yeast gene in vitro, creating an inappropriate codon, and then reintroducing the mutated gene into yeast cells and testing them for physiological changes resulting from the mutation.

Comparison of Yeast Codon Usage to That in E. coli. The DNA sequences of the highly expressed E. coli genes ompA (34), lpp (35), tufA (36), and tufB (37) each exhibit a codon usage which is highly biased toward the same set of preferred codons; for five amino acids, this set contains a completely different codon from the one used in highly expressed yeast genes (Table IV). The existence of codon bias for these proteins and for the less highly expressed ribosomal protein genes of E. coli has been noted previously (34, 35, 11, 38), as has a correlation between these codon usage patterns and the abundance distribution of 35 isoaccepting tRNAs in E. coli (11, 35, 38).
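The comparison can be made explicit by intersecting the preferred-codon sets of the two organisms as transcribed from Table IV. In this sketch Ala and Cys are left out because the table shows no clear E. coli preference for them; the dictionary names are illustrative.

```python
# Preferred codons transcribed from Table IV (Ala and Cys omitted: no clear
# E. coli preference is listed for them).
YEAST = {
    "Ser": {"UCU", "UCC"}, "Thr": {"ACU", "ACC"}, "Val": {"GUU", "GUC"},
    "Ile": {"AUU", "AUC"}, "Asp": {"GAC"}, "Phe": {"UUC"}, "Tyr": {"UAC"},
    "Asn": {"AAC"}, "His": {"CAC"}, "Glu": {"GAA"}, "Gly": {"GGU"},
    "Gln": {"CAA"}, "Lys": {"AAG"}, "Pro": {"CCA"}, "Leu": {"UUG"},
    "Arg": {"AGA"},
}
ECOLI = {
    "Ser": {"UCU", "UCC"}, "Thr": {"ACU", "ACC"}, "Val": {"GUU", "GUA"},
    "Ile": {"AUC"}, "Asp": {"GAC"}, "Phe": {"UUC"}, "Tyr": {"UAC"},
    "Asn": {"AAC"}, "His": {"CAC"}, "Glu": {"GAA"}, "Gly": {"GGU", "GGC"},
    "Gln": {"CAG"}, "Lys": {"AAA"}, "Pro": {"CCG"}, "Leu": {"CUG"},
    "Arg": {"CGU"},
}

disjoint = sorted(aa for aa in YEAST if not YEAST[aa] & ECOLI[aa])
partial = sorted(
    aa for aa in YEAST if YEAST[aa] != ECOLI[aa] and YEAST[aa] & ECOLI[aa]
)
print("no shared preferred codon:", disjoint)  # ['Arg', 'Gln', 'Leu', 'Lys', 'Pro']
print("partly shared:", partial)               # ['Gly', 'Ile', 'Val']
```

The five amino acids with no shared preferred codon are the "completely different codon" cases mentioned above; Gly, Ile, and Val share one preferred codon but differ in the second.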

The set of preferred codons inferred from the codon usage 
Table IV
Codon usage summed for the highly expressed E. coli genes ompA (34), lpp (35), tufA (36), and tufB (37)

Preferred       Amino   Preferred codon(s) in E. coli       Use of all other
codon(s) in     acid    and their use                       triplets for that
yeast                                                       amino acid
GCU, GCC        Ala     GCC not used; no clear preference
UCU, UCC        Ser     UCU, UCC (27)                       7
ACU, ACC        Thr     ACU, ACC (53)                       4
GUU, GUC        Val     GUU, GUA (62)                       8
AUU, AUC        Ile     AUC (43)                            4
GAC             Asp     GAC (43)                            11
UUC             Phe     UUC (20)                            3
UAC             Tyr     UAC (25)                            3
UGU             Cys     No clear preference
AAC             Asn     AAC (31)                            1
CAC             His     CAC (12)                            4
GAA             Glu     GAA (40)                            10
GGU             Gly     GGU, GGC (60)                       6
CAA             Gln     CAG (28)                            2
AAG             Lys     AAA (17)                            6
CCA             Pro     CCG (34)                            5
UUG             Leu     CUG (56)                            2
AGA             Arg     CGU (33)                            7



Table V

The codon bias index and cellular protein levels for six E. coli genes

The amounts of ompA protein and lipoprotein were obtained from Dr. Masayori Inouye, S.U.N.Y., Stony Brook,¹ and the amounts of elongation factor Tu and RNA polymerase β subunit from Dr. Patrick Dennis, University of British Columbia.² The amounts of single-copy ribosomal proteins were calculated from the data given by Kjeldgaard and Gausing (41), using their values for glucose and casamino acid E. coli cultures to calculate the number of ribosomes per cell. L7/L12 was similarly calculated, but is present in four copies per ribosome. The amount of Lac repressor per E. coli cell is from Müller-Hill et al. (42).

E. coli protein                      Codon bias   Molecules per cell     Reference to
                                     index                               codon usage

Major lipoprotein                    0.84         750,000                (35)
Elongation factor Tu                              [200,000-1 × 10^?]
  tufA                               0.84                                (36)
  tufB                               0.?1                                (37)
ompA protein                         0.78         200,000                (34)
Ribosomal proteins L7/L12            0.?          140,000-300,000        (11)
Other ribosomal proteins             0.61         35,000-75,000          (38)
  (present one each per ribosome)
RNA polymerase β subunit             0.53         7,000-15,000           (39)
Lac repressor                        0.18         10                     (40)



data for the major E. coli outer membrane proteins ompA and lipoprotein and elongation factor Tu (Table IV) can be used to calculate a codon bias index for other E. coli proteins (Table V), exactly as done above in Table III for yeast proteins. The results exhibit a striking correlation between the degree of codon bias and the cellular level of each E. coli protein, just as observed in yeast. These results indicate that the selective forces which cause greater codon bias for highly expressed genes have their effect both in yeast and in E. coli, although the maximum bias observed is lower in the bacterium.
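The codon bias index is defined with Table III, outside this section. Assuming it takes the form CBI = (N_pref - N_rand)/(N_tot - N_rand), where N_tot counts residues of amino acids that have a preferred codon, N_pref counts those encoded by a preferred codon, and N_rand is the count expected if all synonymous codons were used equally, the calculation and the correlation claimed above can be sketched in Python. The function names and the toy sequence are hypothetical. The actual Table V indices cannot be recomputed here because the gene sequences are not given; the rank check uses only the Table V rows whose values are legible in this copy (tufB, L7/L12, and the EF-Tu range are left out).

```python
# ASSUMED definition: CBI = (N_pref - N_rand) / (N_tot - N_rand).

# Number of synonymous sense codons per amino acid in the standard genetic code.
DEGENERACY = {
    "Ala": 4, "Arg": 6, "Asn": 2, "Asp": 2, "Cys": 2, "Gln": 2,
    "Glu": 2, "Gly": 4, "His": 2, "Ile": 3, "Leu": 6, "Lys": 2,
    "Phe": 2, "Pro": 4, "Ser": 6, "Thr": 4, "Tyr": 2, "Val": 4,
}


def codon_bias_index(residues, preferred):
    """residues: (amino_acid, codon) pairs; preferred: amino acid -> set of codons.

    Amino acids absent from `preferred` (Met, Trp, or those with no clear
    preference) are ignored.
    """
    scored = [(aa, codon) for aa, codon in residues if aa in preferred]
    n_tot = len(scored)
    n_pref = sum(1 for aa, codon in scored if codon in preferred[aa])
    # Expected preferred count if every synonymous codon were equally likely.
    n_rand = sum(len(preferred[aa]) / DEGENERACY[aa] for aa, _ in scored)
    return (n_pref - n_rand) / (n_tot - n_rand)


def ranks(values):
    """1-based ranks in ascending order; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical toy input using two E. coli preferences from Table IV.
PREFERRED_DEMO = {"Lys": {"AAA"}, "Gly": {"GGU", "GGC"}}
demo = [("Met", "AUG"), ("Lys", "AAA"), ("Lys", "AAA"),
        ("Lys", "AAG"), ("Gly", "GGU")]
print(codon_bias_index(demo, PREFERRED_DEMO))  # (3 - 2) / (4 - 2) = 0.5

# Legible Table V rows: lipoprotein, ompA, other ribosomal proteins,
# RNA polymerase beta subunit, Lac repressor. Ranges are entered at their
# lower bound; they do not overlap, so the ranks are unaffected.
cbi = [0.84, 0.78, 0.61, 0.53, 0.18]
molecules = [750_000, 200_000, 35_000, 7_000, 10]
print(spearman_rho(cbi, molecules))  # 1.0
```

On these five rows the ordering by codon bias index and the ordering by molecules per cell agree exactly, which is one way of quantifying the "striking correlation" described in the text.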

Effects of tRNA Isoacceptor Distribution on the Efficiency of Gene Expression—In differentiated cells of higher organisms, changes in tRNA profiles have been observed to occur in a tissue-specific manner (29, 43, 44). To the extent that the resulting isoacceptor distribution matches the codon usage bias of abundant cell-specific mRNAs, the output of major proteins may be maximized and expression of minor RNAs maintained by mechanisms such as those discussed earlier.

The cloning of such a developmentally regulated higher organism gene and attempts to achieve its expression at a high level either in E. coli (45) or in yeast (46) are accompanied, in effect, by a dedifferentiation of the tRNA population to that which is characteristic of the host cell. It follows from the correlations which we and others (35, 36) have shown between abundant tRNAs and heavily used codons, and from the differences in codon usage between E. coli and yeast (Table IV), that certain cloned genes may be more readily expressed in E. coli and others in yeast. For five amino acids (bottom of Table IV), the difference in biased codon usage between yeast and E. coli is very large indeed. For example, the codon AGA is used to code for 100% of the arginine residues in the most abundant yeast proteins (Table II), while in the three abundant E. coli proteins, CGU comprises 83% and CGC 17% of the arginine codon usage. Higher eukaryote mRNAs differ greatly from one another in arginine codon usage, with AGG being preferred for ovalbumin, β-globin, immunoglobulin (2), and interferon (47) mRNAs but CGN codons preferentially used in histone mRNAs and mRNAs coding for several mammalian polypeptide hormones (2). These considerations suggest that highly efficient translation of cloned mammalian genes in microbial cells may require careful comparison of the codon usage of the gene with the codon preferences of each available host cell system. Certain genes may best be expressed in yeast, others in E. coli.

¹ M. Inouye, personal communication.
² P. Dennis, personal communication.
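The host-comparison argument above can be made concrete with a small scoring function. The Python sketch below, a simplification of our own rather than the authors' procedure, scores a coding sequence against each host's preferred codons, restricted to the five amino acids at the bottom of Table IV, where the yeast and E. coli preferences do not overlap. The dictionary `HOST_PREFERRED`, the function `preferred_fraction`, and the gene fragment are hypothetical. The last lines reproduce the arithmetic behind the arginine figure: 33 of the 40 arginine codons counted in Table IV are CGU.

```python
# Preferred codons from Table IV for the five amino acids where the
# yeast and E. coli preferences do not overlap (simplifying assumption).
HOST_PREFERRED = {
    "yeast": {"Gln": {"CAA"}, "Lys": {"AAG"}, "Pro": {"CCA"},
              "Leu": {"UUG"}, "Arg": {"AGA"}},
    "E. coli": {"Gln": {"CAG"}, "Lys": {"AAA"}, "Pro": {"CCG"},
                "Leu": {"CUG"}, "Arg": {"CGU"}},
}


def preferred_fraction(residues, preferred):
    """Fraction of scored residues whose codon is in the host's preferred set."""
    scored = [(aa, codon) for aa, codon in residues if aa in preferred]
    if not scored:
        return float("nan")
    return sum(codon in preferred[aa] for aa, codon in scored) / len(scored)


# Hypothetical gene fragment as (amino_acid, codon) pairs; not a real gene.
fragment = [("Arg", "AGG"), ("Arg", "AGA"), ("Leu", "CUG"),
            ("Lys", "AAG"), ("Pro", "CCA"), ("Gln", "CAG")]
scores = {host: preferred_fraction(fragment, prefs)
          for host, prefs in HOST_PREFERRED.items()}
for host, score in scores.items():
    print(f"{host}: {score:.2f}")  # yeast: 0.50, E. coli: 0.33

# Arithmetic behind the arginine figures (E. coli column of Table IV).
cgu, other_arg = 33, 7
print(f"CGU share: {100 * cgu / (cgu + other_arg):.1f}%")  # 82.5%, i.e. ~83%
```

A full comparison would score every amino acid with a preference in either host; limiting the sketch to the five divergent amino acids keeps the contrast between the two hosts easy to read.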